Add task.verify, dynamic Stripe keys, and capability warnings#3
Conversation
…ing v0.2 follow-ups
task.verify (agentcli#2):
- New verify field on tasks: shell command runs after completion to confirm outcome
- verify.shell (required), verify.timeout_seconds (default 30), verify.on_failure (error|warn)
- Full pipeline: validation, schema, compiler, exec (v1+v2), scheduler fields
- 19 new tests
Dynamic Stripe key minting:
- Implement key_strategy: "dynamic" in stripe-api-key provider
- Mint restricted keys via POST /v1/api_keys with master key auth
- Cleanup via DELETE /v1/api_keys/{id} (best-effort)
- Dynamic prepareHandoff mints narrower keys for child scope
- 45 new tests with mock HTTP server
Capability warnings:
- validateManifestCapabilities returns { errors, warnings } instead of flat array
- Soft warnings for runtime_identity_resolution and credential_handoff (downscope)
- Updated all callers (apply.js, cli.js, runtime adapter)
- 6 new tests
DRY dedup: already resolved (schedulerCreateSpec shared via apply.js import)
…ation Add architecture docs explaining the credential flow control surfaces, child credential policy enforcement model, and separation between agentcli (control plane) and scheduler (execution runtime) roles.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a task-level post-success "verify" step (schema, validation, compile, runtime), surfaces capability warnings alongside errors, implements Stripe-backed dynamic restricted API key minting/handoff with cleanup, and documents scheduler/child trust boundaries and execution/control-plane identity separation. Changes
Sequence Diagram(s)sequenceDiagram
participant Scheduler as Scheduler
participant Executor as Task Runner
participant Verify as Verify Runner
participant Audit as Audit Logger
participant Stripe as Stripe API
Scheduler->>Stripe: (if dynamic) POST create restricted key (mint)
Stripe-->>Scheduler: return key id/secret/expires
Scheduler->>Executor: dispatch compiled job (includes verify, materialization/handoff metadata, narrowed creds)
Executor->>Executor: run main task command (spawn)
Executor->>Executor: capture exit_code, stdout, stderr
alt exit_code == 0 and verify present
Executor->>Verify: runVerify(shell, timeout_seconds)
Verify->>Verify: spawnSync('sh', ['-c', ...]) with timeout
Verify-->>Executor: {passed, exit_code, stdout, stderr, timed_out, duration_ms}
alt verify.passed or verify.on_failure == 'warn'
Executor->>Audit: append verify result (warning if applicable)
else
Executor->>Executor: mark verify_failed, effectiveOk=false
Executor->>Audit: emit failure audit (includes verify result)
Executor->>Scheduler: propagate verify_failed error
end
else
Executor->>Audit: log main execution result (no verify)
end
opt cleanup/handoff for dynamic keys
Scheduler->>Stripe: DELETE restricted key (best-effort revoke)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
Note 🎁 Summarized by CodeRabbit FreeYour organization is on the Free plan. CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please upgrade your subscription to CodeRabbit Pro by visiting https://app.coderabbit.ai/login. Comment |
There was a problem hiding this comment.
Pull request overview
Adds post-run verification for tasks, implements dynamic Stripe restricted-key minting with cleanup, and enhances capability checking to return both hard errors and soft warnings (plus updated trust-boundary documentation).
Changes:
- Introduce
task.verify/workflow.verifyvalidation, compilation to scheduler fields, and execution-time verify behavior. - Implement
stripe-api-keyproviderkey_strategy: "dynamic"with Stripe API mint/revoke helpers and extensive tests. - Update capability validation to return
{ errors, warnings }and surface warnings in CLI/apply paths; expand architecture/security docs.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/agentcli.test.js | Updates capability tests to new return shape and adds extensive dynamic Stripe + verify test coverage. |
| src/validate.js | Adds verify to known keys and validates verify blocks for workflows/tasks. |
| src/schema.js | Extends manifest schema with verify fields and adds scheduler job verify columns. |
| src/scheduler-fields.js | Adds verify fields to v0.2 scheduler field list. |
| src/runtime/openclaw-scheduler.js | Adapts to { errors, warnings } return shape (uses errors). |
| src/identity/stripe-api-key.js | Implements dynamic key minting + cleanup, adds helpers/constants, exports for tests. |
| src/exec.js | Adds verify execution phase and returns/audits verify results; verify failures can flip ok / throw. |
| src/compiler/shared.js | Adds resolveVerify() and includes verify in normalized task plans. |
| src/compiler/openclaw-scheduler.js | Compiles verify settings to scheduler job fields. |
| src/cli.js | Surfaces capability warnings in --check-capabilities output. |
| src/capabilities.js | Changes API to return { errors, warnings } and adds new warning checks. |
| src/apply.js | Prints capability warnings to stderr during apply while still gating on errors. |
| docs/execution-identity.md | Adds security-model section describing principal separation/trust boundary. |
| docs/architecture.md | Documents scheduler/child trust boundary and credential strategy model. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ey guard - Embed session in materialize result so cleanup() can access stripe_key_id without callers needing to thread session through ctx - Restrict http:// api_base to localhost unless allow_insecure_http is set, preventing accidental credential leak over plain HTTP - Add min: 1 to verify_timeout_s in published schema - Guard resolveMasterKey against missing/non-object master_key_source
|
Follow-up review pass pushed in e0e1e46. Addressed two execution-path regressions in this PR:
Also wired task timeout into dynamic session resolution and documented the new workflow/task verify block in the field reference. Local verification: npm test, npm run lint. |
|
Superseded by the later follow-up comments on this PR. The previous body was malformed by shell escaping; the current review summary is in the newer owner comments and review notes. |
amittell
left a comment
There was a problem hiding this comment.
Validation
- Ran
npm testlocally on the PR head: all tests passed (fullnode --testsuite including new coverage fortask.verify, Stripe dynamic keys, and capability warnings).
Summary
Solid, cohesive v0.2 follow-up: post-exec verify, dynamic Stripe restricted-key minting with cleanup and HTTP hardening, capability negotiation split into hard errors vs soft warnings, plus helpful architecture / trust-boundary docs. Tests are extensive and match the behavioral contracts described in commit messages.
Strengths
- Verify pipeline is wired end-to-end: validation, schema (
min: 1on timeout),resolveVerifymerge (task overrides workflow), compiler → scheduler fields, and both v0.1 / v0.2 exec paths with correct cwd/env (including materialized identity env in v0.2). - Stripe
dynamic: master key resolution is guarded;api_baseblocks non-localhttp:unless opted in; mint/delete paths are tested with a mock HTTP server; materialization embeds session socleanup()can revoke keys without threading extra context. - Capabilities: returning
{ errors, warnings }is a clean upgrade;--check-capabilitiesexposes structured warnings for automation. - Docs clearly separate agentcli control plane vs scheduler runtime — good for security reviews.
Follow-ups (non-blocking; also left inline)
- Evidence vs
verify(v0.2): Evidence is attested before the verify shell runs, so envelopes can show success even when verify later fails. Document or adjust if evidence is meant to cover “verify included” outcomes. apply --json: Capability soft-warnings only hit stderr; consider surfacing them in the JSON payload like--check-capabilitiesdoes.- Stripe: Update stale
resolveSessionJSDoc that still says dynamic is unimplemented. - Capabilities: Handoff vs
downscopewarning/error asymmetry is intentional but subtle — worth a sentence in field-reference.
Overall: No test regressions found; approve-worthy from a correctness/CI perspective with the above documentation/product clarifications as polish.
…c, capability cross-link - Document evidence vs verify ordering in exec.js (evidence attests command outcome, verify is a separate operator-local deliverable check) - Include capability warnings in apply JSON output (capabilities.warnings) so automation parsing stdout gets structured warning data - Fix stale JSDoc claiming dynamic strategy is not implemented - Add cross-link comment explaining hard error (presentation.handoff) vs soft warning (child_credential_policy downscope) asymmetry
|
One more follow-up commit is on the PR branch: e8a5808. I found a second-order contract issue in the new verify path: the main task respected strict sandbox/network enforcement, but the post-success verify command was still shelling out directly. That meant verify could bypass the task contract on supported local sandbox runners. This is now fixed by running verify through the same sandbox wrapper/profile when strict enforcement is active, with a regression test covering the darwin sandboxed path. Validation rerun after the update: npm test, npm run lint. |
|
Polish follow-up pushed in This pass does not change runtime behavior. It tightens the review surface around the current PR head:
Validation rerun on this head: |
|
Post-merge assessment on PR #3 (
I pushed a small follow-up branch with both fixes:
Validation on that branch:
The new regression test covers delegated |
Summary
verifyfield on tasks runs a shell command after completion to confirm the expected outcome exists. Supportson_failure: error|warnand configurable timeout.key_strategy: "dynamic"in the stripe-api-key provider now mints restricted keys via the Stripe API with automatic cleanup. Includes scope-based permission mapping and expiry management.validateManifestCapabilitiesnow returns{ errors, warnings }. Soft warnings surface when manifests useruntime_identity_resolutionorcredential_handoff(downscope) against schedulers that don't advertise those features.Test plan
Summary by CodeRabbit
New Features
Documentation
verifyfield documented in field reference.Improvements
Tests